Delta-melspectra features for noise robustness to DNN-based ASR systems
Abstract
Deep neural networks (DNNs) have significantly improved automatic speech recognition (ASR) accuracy across a range of speech scenarios. However, noise robustness remains a challenge for DNNs: accuracy degrades significantly in noisy environments compared with clean conditions. Many current DNN-based ASR engines use log-MelSpectra features together with their temporal differences, the delta and delta-delta features. In this work we introduce delta-MelSpectra features, which aim at significant gains for DNNs in noisy environments, and demonstrate that taking the temporal difference directly in the MelSpectra domain can provide superior noise-robust features. We validate the delta-MelSpectra features on a multistyle-trained DNN-ASR system: tested on large-scale Windows Phone client data, they yield 17% and 12% relative reductions in word error rate (WER) for noisy and clean environments, respectively.
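The distinction drawn in the abstract, differencing log-MelSpectra versus differencing the MelSpectra directly, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The regression-style delta formula, the window of 2 frames, the edge padding, the flooring constant and all function names are assumptions made for illustration. The abstract does not say whether the delta-MelSpectra features are compressed or normalized afterwards, so the sketch leaves them raw.

```python
import numpy as np


def delta(features: np.ndarray, window: int = 2) -> np.ndarray:
    """Regression-style temporal difference along the time axis (axis 0).

    features has shape (num_frames, num_bins). Edge frames are handled by
    repeating the first and last frame (an assumption of this sketch).
    """
    num_frames = features.shape[0]
    padded = np.pad(features, ((window, window), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, window + 1))
    out = np.zeros_like(features, dtype=float)
    for n in range(1, window + 1):
        ahead = padded[window + n : window + n + num_frames]
        behind = padded[window - n : window - n + num_frames]
        out += n * (ahead - behind)
    return out / denom


def log_mel_with_log_domain_deltas(mel: np.ndarray, floor: float = 1e-10) -> np.ndarray:
    """Conventional stack: log-mel plus delta and delta-delta of the log-mel."""
    log_mel = np.log(np.maximum(mel, floor))
    d1 = delta(log_mel)
    d2 = delta(d1)
    return np.concatenate([log_mel, d1, d2], axis=1)


def log_mel_with_mel_domain_deltas(mel: np.ndarray, floor: float = 1e-10) -> np.ndarray:
    """Illustrative variant: differences are taken on the mel spectra before the log."""
    log_mel = np.log(np.maximum(mel, floor))
    d1 = delta(mel)  # temporal difference directly in the mel-spectral domain
    d2 = delta(d1)
    return np.concatenate([log_mel, d1, d2], axis=1)


if __name__ == "__main__":
    # Synthetic non-negative "mel spectra": 100 frames, 40 filterbank bins.
    rng = np.random.default_rng(0)
    mel = rng.random((100, 40)) + 0.1
    print(log_mel_with_log_domain_deltas(mel).shape)
    print(log_mel_with_mel_domain_deltas(mel).shape)
```

Both functions return a feature matrix with three times as many columns as filterbank bins. They differ only in the domain where the temporal difference is computed, which is the design choice the abstract highlights.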
Similar resources
Exploring Low-Dimensional Structures of Modulation Spectra for Robust Speech Recognition
Developments of noise robustness techniques are vital to the success of automatic speech recognition (ASR) systems in face of varying sources of environmental interference. Recent studies have shown that exploring low-dimensional structures of speech features can yield good robustness. Along this vein, research on low-rank representation (LRR), which considers the intrinsic structures of speech...
Improved Automatic Speech Recognition Using Subband Temporal Envelope Features and Time-Delay Neural Network Denoising Autoencoder
This paper investigates the use of perceptually-motivated subband temporal envelope (STE) features and time-delay neural network (TDNN) denoising autoencoder (DAE) to improve deep neural network (DNN)-based automatic speech recognition (ASR). STEs are estimated by full-wave rectification and low-pass filtering of band-passed speech using a Gammatone filter-bank. TDNNs are used either as DAE or ...
Listening in the Dips: Comparing Relevant Features for Speech Recognition in Humans and Machines
In recent years, automatic speech recognition (ASR) systems gradually decreased (and for some tasks closed) the gap between human and automatic speech recognition. However, it is unclear if similar performance implies humans and ASR systems to rely on similar signal cues. In the current study, ASR and HSR are compared using speech material from a matrix sentence test mixed with either a station...
An Investigation on the Use of i-Vectors for Robust ASR
In this paper we propose two different i-vector representations that improve the noise robustness of automatic speech recognition (ASR). The first kind of i-vectors is derived from “noise only” components of speech provided by an adaptive MMSE denoising algorithm, the second variant is extracted from mel filterbank energies containing both speech and noise. The effectiveness of both these repre...
Uncertainty Decoding with Adaptive Sampling for Noise Robust DNN-Based Acoustic Modeling
Although deep neural network (DNN) based acoustic models have obtained remarkable results, the automatic speech recognition (ASR) performance still remains low in noise and reverberant conditions. To address this issue, a speech enhancement front-end is often used before recognition to reduce noise. However, the front-end cannot fully suppress noise and often introduces artifacts that are limit...